The Annals of Statistics 

2007, Vol. 35, No. 6, 2620-2638 

DOI: 10.1214/009053607000000361 

© Institute of Mathematical Statistics. 2007 

TESTING THE SUITABILITY OF POLYNOMIAL MODELS IN 
ERRORS-IN- VARIABLES PROBLEMS 

By Peter Hall and Yanyuan Ma 

Australian National University, and University of Neuchatel 
and Texas A&M University 

A low-degree polynomial model for a response curve is used com- 
monly in practice. It generally incorporates a linear or quadratic func- 
tion of the covariate. In this paper we suggest methods for testing the 
goodness of fit of a general polynomial model when there are errors 
in the covariates. There, the true covariates are not directly observed, 
and conventional bootstrap methods for testing are not applicable. 
We develop a new approach, in which deconvolution methods are 
used to estimate the distribution of the covariates under the null hy- 
pothesis, and a "wild" or moment-matching bootstrap argument is 
employed to estimate the distribution of the experimental errors (dis- 
tinct from the distribution of the errors in covariates). Most of our 
attention is directed at the case where the distribution of the errors in 
covariates is known, although we also discuss methods for estimation 
and testing when the covariate error distribution is estimated. No 
assumptions are made about the distribution of experimental error, 
and, in particular, we depart substantially from conventional para- 
metric models for errors-in-variables problems. 

1. Introduction. Suppose we observe independent pairs (Wi,Y±), . . . , (W n 
Y n ) distributed as (W,Y), where 

(1) Y = g{X)+e, W = X + U, E(e\U,X)=0, 

and U is independent of X and has zero mean. The particular model of 
interest is that where g is a polynomial, 

(2) //•:,'•; 

j=o 
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Here, for < j < p, 0® denotes the true value of a parameter (5j . 

Our main purpose in this paper is to suggest ways of assessing the good- 
ness of fit of the polynomial model. We shall treat the goodness-of-fit prob- 
lem as one of testing the null hypothesis that g{x) can be expressed as in 
(2), for values of x that lie in the support of the distribution of X. Note that 
in the absence of measurement error, that is, if we could observe X, then 
this problem could be solved readily by using, for example, the test statistic 
and its properties developed by Fan, Zhang and Zhang [14]. However, the 
presence of the measurement error complicates the problem, and we are not 
aware of an existing method in that case. 

A related problem has been treated by Cheng and Kukush [4]. There, an 
ingenious, asymptotic squared-difference goodness-of-fit test is suggested, 
based on a statistic which, under the null hypothesis, has a limiting chi- 
squared distribution with one degree of freedom. However, it is readily seen 
that, while the Cheng and Kukush [4] test has good power properties against 
some alternatives, it has zero power against many others. Intuitively, this is 
because the test addresses only one mode of a potentially infinite number of 
modes of departure from the null hypothesis of a polynomial fit. That single 
mode, or single component in the infinite class of components that are all 
orthogonal to the class of all polynomials of degree p, is responsible for the 
single degree of freedom in the test of Cheng and Kukush [4]. 

By way of contrast, the test proposed in the present paper addresses 
simultaneously the infinity of components that can define departure from 
the class of polynomials of degree p. In this setting the limiting distribution 
of any test statistic will be relatively complex, and an asymptotic test will 
not be feasible. We suggest instead a bootstrap method for calibrating the 
test and producing critical points. 

However, bootstrap methods in this problem are necessarily quite non- 
standard. Indeed, the bootstrap is seldom used in the context of errors in 
variables, since neither the explanatory variables X nor the errors e can be 
directly accessed. At best, only their distributions can be estimated, and 
so the bootstrap cannot proceed by resampling either observed or imputed 
data, such as residuals. 

Quite different methods are required for estimating the distributions of 
X and e, as a prelude to applying the bootstrap. From some points of view, 
estimating the distribution of X is the simpler of the two tasks; that prob- 
lem is one of conventional deconvolution, in which, given the distribution 
of U, we wish to estimate the distribution of X from data on W = X + U . 
However, it can be shown theoretically that, unless the distribution of U 
is especially unsmooth, the distribution function of X cannot be estimated 
root-n consistently. For example, if the characteristic function of the distri- 
bution of U decays like \t\~ a as \t\ — > oo, where a > 0, then it can be proved 
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that a necessary condition for root-n consistency to be achievable is that 
a < This constraint denies even a single derivative to the density of U. 

Therefore, the distribution of X seldom can be estimated root-n consis- 
tently, and so bootstrapping that variable presents challenges. Estimating 
the distribution of e is an even more awkward problem. However, a care- 
ful examination of theoretical issues shows that the limiting distribution of 
our test statistic depends on properties of e only to the extent of vare, and 
so the moment-matching, or wild, bootstrap is feasible for estimating the 
distribution of experimental error. 

In summary, key contributions of this paper are constructing a statistic 
for testing model adequacy in the context of polynomial models with mea- 
surement error; proposing a nonstandard bootstrap method for assessing 
the distribution of the test statistic under the null hypothesis; and showing 
how to use repeated measurements when the measurement-error distribu- 
tion is unknown. Innovations include the novel form of the test statistic, 
the unconventional way in which it is computed, using both deconvolution 
and wild-bootstrap techniques, and our theoretical derivation of properties 
of the test. 

Various forms of (1) have been studied in the literature. Most of the work 
focuses on a parametric model framework, where a parametric form of the 
distribution of e given U and X is adopted, typically being normal. When 
g(X) is linear, extensive research can be found in Fuller [17], and the efficient 
estimator was given by Bickel and Ritov [1]. The same efficient estimator 
was also discovered in a broader generalized linear model framework by 
Stefanski and Carroll [27]. The extension of g(X) to a general polynomial 
was first studied in Chan and Mak [3], where a root-n consistent estima- 
tor was constructed. Their work was later further extended by Cheng and 
Schneeweiss [5] and Cheng, Schneeweiss and Thamerus [6] . A comparison of 
several methods is given by Kukush, Schneeweiss and Wolf [21]. 

A review and study of a class of estimators can be found in Taupin [29]. 
Consistent and efficient estimators for a general function g were recently 
constructed by Tsiatis and Ma [30]. Estimators proposed in Bickel and Ritov 
[1] and Cheng and Schneeweiss [5] also apply when a distributional model 
for (e\U, X) is not assumed, hence their model is in fact semiparametric. A 
further extension from this semiparametric model framework is to consider 
a partially linear model through replacing g{X) by X(3 + 0(Z), where (3 
is an unknown parameter and 6(Z) is an arbitrary unknown function of 
some observable covariates Z. Estimators in this setting were proposed by 
Liang, Hardle and Carroll [23]. When no functional form is assumed for 
g(X), the model becomes nonparametric. Estimators and their properties 
are studied in Fan and Truong [13] and Efromovich [10, 11]. Recent work on 
the moment-matching bootstrap includes that of Fan and Li [15], Flachaire 
[16], Dommguez and Lobato [9], Praskova [26], Kauermann and Opsomer 
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[20], Li, Hsiao and Zinn [22] and Gonzalez Manteiga, Martinez Miranda and 
Perez Gonzalez [18]. 

The distribution of U in (1) might be known, or it might be estimated 
directly or from replicated data on W. To focus on the main problem, we 
assume first that the distribution of U is known, and treat subsequently, in 
Section 5, the case where it is unknown. 

2. Methodology. 

2.1. Methodology for estimating fio, . . . , (3 P . Because various methods for 
estimating (3 already exist, we only briefly outline the estimator that is 
used in this paper. Let xo,xi, ... be real numbers with xo ^ 0, and define 
recursively functions Pj of j + 1 variables by Pq(xq) = > ana ^ 

x Pj(xo,...,Xj) = ~Y^ y J k J Xj- k P k (x ,...,X k ), j > 1. 

Given a random variable Z, define Hj(Z) = E(Z J ). From the data (W k , Yk), 
construct estimators of aj = Hj(W) and Aj = E(YW J ) using aj = n~ l J2 k ^ k 
and Aj = n~ 1 J2k YkW 3 k , respectively. Define bj = Hj(X) and Bj = E(YX'j), 
and put Uj = fij(U), a known quantity. It can be shown that 

b i = Z~2 ( l ) a i-k p k(vo, ■ ■ -,Vk), 



k=0 

J 



B 3 = zZ ( i ) A 3-kPkiyO, Vk)- 



. k 

k=0 

Hence, under moment assumptions, root-n consistent estimators of bj and 
Bj are 

j 



k=o 
(3) 



h = zZ[t) aj-kPk(vo,...,i/ k ), 

l — n V / 



k=0 



An estimator of the true values (3° = (/?o, • • • , Pp) T is given by 
(4) (3 = M~ X B, 

where B = (Bo, . . . , B P ) T , M = (rhjk) is a (p + 1) x (p + 1) matrix, and 
fhjk = bj +k for < j,k <p. It can be proved that $ is root-n consistent for 
j3 and is asymptotically normally distributed, provided (1) and (2) hold, 
E(W Ap ) + E(e 2 ) < oo and the distribution of X has a nondegenerate con- 
tinuous component. 
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2.2. Hypotheses and test statistic. Consider the problem of testing the 
null hypothesis Hq = Hq(p), that g in the model (1) is given by (2) for 
an appropriate choice of (3o,...,(3 p , against the complementary alternative 
Hi(p). Since we have access to information about g(x) only when x is in 
the support of the density fx of X, then Hi(p) should have the form: g 
is not equal almost everywhere on supp/x to a polynomial of degree p. 
Equivalently, H\(p) is characterized by the class of functions g that are 
bounded on compact intervals and satisfy 

inf r°|^)-<^|/3)| 2 ^>0 
/3o,—,P P J-t 

for each to > 0, where 

m = E{g(X)e ltx }, <f>(t\0)-- 

and i = y/—l. In defining ip and cj) we assume that + < oo. 

These considerations suggest that we base our test on the statistic T(f3), 
where 

(5) T(P)= [m)-0(t\(3)\ 2 Wl (t)dt; 




ip(t) and <j){t | (3) are root-n consistent estimators of ip{i) and 4>(t \ /?), respec- 
tively; (3 denotes either (3, defined at (4), or an alternative estimator, such 
as argminT(/3); and w\ > is a known weight function. 

For computational simplicity, we shall take (3 = (3. Unbiased estimators of 
ijj(t) and cj)(t | (3) are given by 




(6) 

0(t|/3) 

where Dt = d/dt is the differentiation operator, and is the characteristic 
function of U, or equivalently, the Fourier transform of fjj. For these choices 
of (3, i\) and 4>, our test amounts to rejecting Hq if the statistic S = T(f3) is 
too large. We shall use bootstrap methods to determine a critical point for 
the test. As a prelude to that step, we require estimators of the distributions 
of X and e. 

In order to remove the function fff from denominators in (6), it is con- 
venient to take w\ = (/^ t ) 2 w, where w is another weight function. This 
produces the statistic 

T{(3) = [ \m - 4>{t I P)\ 2 f^(t) 2 w(t) dt. 
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2.3. Estimator of distribution of X . The distribution of X is accessible 
using conventional deconvolution methods, as follows. Given data W\, . . . , W n 
on W = X + U , a kernel estimator of the density fx of X is given by 

1 „(x- W. 
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fx(x) = fx(x\h) = — Y / L 

3=1 

where 

L(u) = ± [e^ KFt{t) dt 
{ ) 2vr 7 e /«(t/A)*' 

is a kernel function (in particular, a function which integrates to 1), K Ft 
denotes the Fourier transform of K and h > is a smoothing parameter. 
See, for example, Carroll and Hall [2], Stefanski and Carroll [28] and Fan 
[12]. 

Integrating fx, we obtain an estimator Fx of the distribution function 
Fx of X, 



F x (x)=F x (x\h)= f fx(u)du = -J2Li(x-Wj), 



(7) LM = - + —\ — ^Wti^^ 



'i 

'3=1 

where L\(hu) = f v<u L(v) dv, or equivalently, 

1 If 00 smtuK Ft (ht 

2 + 2^J- 00 ~t fj*(t) 



provided that K Ft (ht) / fy l (t) is real valued. In Section 3 we discuss choice 
of K and h. 

Next we convert Fx to a distribution function Fx by defining first Fx{x) = 
max u < x Fx (u) and then 

(8) Fx(x)- Px{x) - Px{Cl) 



F x {c 2 ) - F x {a) 



if ci < x < C2, Fx{x) = if x < ci, and Fx(x) = 1 if x > C2, where ci < C2 
are constants. 



2.4. Estimator of distribution of e. Conventional deconvolution meth- 
ods can be used to estimate the distribution of e when p = 1, although 
they are awkward to implement; and they fail for p > 1. Fortunately, sat- 
isfactory accuracy can be obtained using a simpler, moment-matching or 
"wild" bootstrap approach. To this end, let u> r = E(e r ), for integers r > 1, 
and note that u\ = 0; let cD r , for r > 2, denote respective estimators of w r ; 
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let G(- 1 K2, ■ ■ ■ , K q ) be a known distribution with zero mean and moments 
/ x r dG(x) = K r , for 2 < r < q, where q>2; and put 

(9) F £ = G(-\Q 2 ,...,Q q ). 

This estimator is generally not consistent for F e , but it is adequate for our 
purpose. Examples of the distribution G will be given in Section 3. 

In Section 4 we shall show that the asymptotically correct level for the 
test is achieved by taking q>2. Although q = 2 is sufficient, accuracy can 
be improved by using q = 3, or, in the case of near symmetry, fitting a 
distribution with first and third moments equal to zero and second and 
fourth moments equal to u) 2 and 0)4; see Section 3. The moment-matching 
bootstrap could also be employed to estimate the distribution of X, but 
there we require q > Ap. 

Next we define estimators cD r . Observe that 

u r = E(e r ) = E(Y r ) - E Q E{e s )E (j^ 

(10) ^ 

= E(Y r ) - £ ,, , H ■ • • 

s=0t +-+t p =r-s SX °- " ' l P- 

where the second summation is over integers to, . . . ,t p >0 such that to + • • • + 
t p = r — s, and, since -E'(e) = 0, we may exclude from the first summation 
in (10) the term corresponding to s = 1. Results (3) and (4) give us root- 
re consistent estimators bj and (5j of bj = E{X 3 ) and 0j, respectively, and 
we can readily compute Y r = n Y^jYj \ an unbiased estimator of E(Y r ). 
Therefore, having constructed estimators Q\ = 0, 0)2, . . . ,cD r _i, we define u) r 
recursively by 

s= 0t +-+t p =r-s S -H)----V 

2.5. Implementing the bootstrap test. Our bootstrap method has six steps, 
as follows, (a) Compute the estimators /3o, ■ ■ ■ ,P P suggested in Section 2.1, 
and the distribution estimators Fx and F £ suggested at (8) and (9). Cal- 
culate the test statistic S = T{(3) from (5). (b) Draw data X(, . . . ,X* from 
Fx , £*,...,£* from F e and U{ , . . . , U * from the distribution of U, and put 
9(x) = Eo<j< P ^, Y* = g(X*) + e* and W* = X* + U*, for 1 < j < n. 
(c) Using the data pairs (W*, Y?) in place of (Wj,Yj), compute the estimator 

(3* = (4q, • • • , Pp) T of p = (A), . . . , (3 P ) T . (d) Compute the analogue T*{[3) of 
T{0) defined at (5), using (W*,Y*) instead of (Wj,Yj), and form the statis- 
tic S* =T*(f3*), the bootstrap analogue of S = T((3). (e) Using repeated 
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Monte Carlo simulation, approximate the distribution of S* conditional on 
the data V = {{Wi,Y\), . . . , (W n ,Y n )}, and in particular, approximate the 
critical point s a such that P(S* > s a \ V) = 1 — a. (f) Reject Hq(p) in favor 
of Hi (p) at the nominal level a if S > s a . 

3. Computational issues and numerical results. 

3.1. Choice of K and h. Generally, K is selected so that K Ft vanishes 
outside a compact interval. A popular choice is 

(11) K(x) = 48(cosx)(l - lSx" 2 )^ 4 ) -1 - 144(sinx)(2 - SaT 2 )^ 5 )" 1 , 

for which K Ft (t) = (1 - t 2 ) 3 if \t\ < 1 and K Ft (t) = otherwise. See, for 
example, Delaigle and Gijbels [7, 8]. 

In such cases, L\(u) is well defined by (7) and finite for each u, provided 

(12) f Ft is real-valued and does not vanish on the real line. 

This is a common assumption in deconvolution problems, and while it can 
be circumvented, we shall use models for which it holds. 

As a prelude to bandwidth choice, we note that if (12) holds and f Ft (t) ~ 
Ct~ a as t —y oo, with a > |, and if K is as at (11), then 

/oo _ 
E{F x {x | h) - F x {x)} 2 dx = Cin^h l ~ 2a + C 2 /i 4 + o{n^h l ^ 2a + /i 4 ), 
-oo 

where d = C 2 k{u)/ii, C 2 = 9j(f' x ) 2 dx and k{o) = j t>Q t 2a - 2 K Ft (t) 2 dt. 

Delaigle and Gijbels [8] suggested methods for estimating Jx = J {fx) 2 dx, 
and hence, for approximating The simplest of their techniques is a 
"normal reference" approach, analogous to bandwidth choice in density 
estimation by comparison with the normal distribution. Specifically, fx 
is taken to be a normal N(0, a\) density, where a\ =varX and is esti- 
mated by a\ , equal to the empirical variance of the data W\ , ■ ■ ■ , W n , minus 
the known variance of U. If X were normally distributed, then Jx would 
equal (4-7t 1 / 2 (T^ : ) _1 . Therefore, we take Jx = (47r 1//2 o"^-) _1 , and so our esti- 
mator of C 2 is C'2 = 9/(4ir 1 / 2 cr x ). Finally, we compute n(a), and then C\, 
using the known value of a, and choose h to minimize C\n" x h x ~ 2a + C2/1 4 . 

3.2. Choice of the distribution G = G(- 1 L02, ■ ■ ■ ,w q ). The simplest case 
is that where q = 2, in which instance one would generally take G to be 
the normal N(0, 0J2) distribution. Two examples of distributions G that are 
suitable when u\ = = and q = 4 are the three-point distribution defined 
by 

(13) P(Z = 0) = 1 - vr, P(Z = ±7r" 1/2 ) = ivr, 
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where < it < 1, and the Student's t distribution. The three-point distri- 
bution can be used to capture any pair (u)2, W4), regardless of the sign of 
kurtosis. The Student's t distribution can capture only (0^2,^4) for which 
kurtosis is positive; however, the positive sign is the more common in prac- 
tice. 

The distribution at (13) has E(Z) = 0, E(Z 2 ) = 1 and E{Z 4 ) = vr" 1 . 

Therefore, if it = uj^/uja, then the distribution of u^ 2 Z is symmetric with 
variance and fourth moment equal to 0J2 and W4, respectively. To implement 
the moment-matching method in this setting, one replaces u>2 and by their 
respective estimates, 0)2 and D4, discussed in Section 2.4; takes the estimator 
of 7r to be 0)2/0)4; estimates the distribution of Z, at (13), by replacing it 
by this estimator; and, if Z has this estimated distribution, takes the dis- 
tribution G (our surrogate for the distribution of e) to be that of uj 1 / 2 Z . In 
some instances it is not possible to reliably estimate high-order moments, 
and there only low-order moments are fitted. For example, in the Alaskan 
Earthquake example in Section 3.4 we fit the normal N(0,u;2) distribution. 

The convergence rate of the wild bootstrap typically improves when higher 
moments are fitted in additional to the first two moments, as discussed by 
Liu [24], Mammen [25] and Hardle and Mammen [19]. The model that they 
consider has the form Y{ = g(Xi) + et, where e% is not necessarily identically 
distributed. When g is linear, Liu [24] shows analytically that the second- 
order properties of the wild bootstrap are obtained when the third moment is 
fitted, due to a correction of a skewness term in the Edgeworth expansion of 
a sampling distribution. Following these results, in the following simulation 
studies we implement the three-point distribution which fits the first four 
moments. 

3.3. Simulation results. We conduct two simulation studies. In the first, 
we generate data from the linear errors-in- variables model Yi = (3\Xi + (3q + 
£j, with Wi = Xi + Ui, i = 1, . . . ,n. The latent variables X{ are generated 
from a uniform distribution in [—3,4], and the experimental errors £i come 
from a normal distribution with mean and variance 1. We generate the 
measurement errors Ui from two different distributions: a normal N(0, 1) 
distribution and a Laplace distribution with variance 0.5. In each parameter 
setting we simulate 2000 datasets, for various sample sizes n, and use boot- 
strap resample size 100. In the first simulation study, datasets are generated 
under Hq. 

The purpose here is to assess the level accuracy of the test. Results are 
given in Table 1. The two different measurement error distributions in the 
upper and lower halves of the table correspond to two approaches to choosing 
h. In the case of normal error, the optimal h is infinity, while for Laplace 
error, a finite value of h is obtained using the strategy described in Section 
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Table 1 

Simulation 1: Level accuracy. In the first block (upper half table), the measurement error 
Ui is normal N(0, 1) and the bandwidth is h — 5.0, which is practically infinite with 

respect to the support of the distribution of X . In the second block (lower half table), Ui 
has a Laplace distribution with zero mean and variance 0.5, and the bandwidth h is 

calculated as suggested in Section 3.1. Each entry in the table is based on 2000 simulated 
datasets. The bootstrap resample size is 100. The very top row of the table gives the 

nominal levels 



n 


5% 


6% 


7% 


8% 


9% 


10% 






Normal measurement 


error 






50 


4.55% 


5.85% 


6.38% 


7.63% 


9.45% 


9.93% 


60 


4.38% 


6.05% 


6.68% 


8.00% 


10.00% 


10.50% 


70 


4.53% 


6.05% 


6.55% 


7.63% 


9.10% 


9.63% 


80 


5.03% 


6.15% 


6.70% 


7.75% 


9.20% 


9.73% 


90 


4.33% 


5.60% 


6.13% 


7.28% 


9.20% 


9.70% 


100 


5.33% 


6.60% 


6.95% 


7.93% 


9.65% 


10.13% 






Laplace measurement error 






50 


4.75% 


6.40% 


6.83% 


7.75% 


9.30% 


10.05% 


60 


5.05% 


6.65% 


7.23% 


8.45% 


10.35% 


10.88% 


70 


4.78% 


6.55% 


7.13% 


8.23% 


9.55% 


10.08% 


80 


4.45% 


6.30% 


6.83% 


7.95% 


9.65% 


10.20% 


90 


5.25% 


6.50% 


7.18% 


8.50% 


9.75% 


10.40% 


100 


5.05% 


6.70% 


7.15% 


8.15% 


9.90% 


10.65% 



3.1. As can be seen from Table 1, even in this simple model and even for 
very small sample sizes n = 50 to n = 100, the rejection levels under the null 
hypothesis are close to the desired levels for both error types. We repeated 
the simulations with bootstrap resample size 200 and obtained very similar 
results (not reported here). 

The second simulation study addresses power. Here we generate datasets 
from the model Yi = (5\Xi + (3q + ccos(Aj) + £j, with Wi = Xi + Ui and 
c = 1.5 a constant. (The first study used the same model but with c = 0. 
The distributions of e,, U and Xi are as in the first study.) We again take 
bootstrap resample size to be 100, but this time we calculate the power of 
the test at different levels. The results for different sample sizes are given 
in Table 2. We can see that as sample size increases, so too does the power. 
In this experiment, sample size n = 80 can already achieve 90% power at 
level 5%. For n = 100, power increases to 95%, which is usually sufficient in 
practice. In simulations with bootstrap resample size 200 we obtained very 
similar results. Hence, bootstrap resample size 100 seems adequate. 

3.4. Alaskan Earthquake example. We implement our method on the 
Alaskan Earthquake data studied by Fuller ([17], Chapters 1 and 4). In this 
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Simulation 2: Power. 


Parameter 


settings are as for 


Table 1 




n 


5% 


6% 


7% 


8% 


9% 


10% 






Normal measurement error 






50 


71.98% 


75.90% 


76.65% 


78.05% 


79.80% 


80.40% 


60 


78.78% 


81.60% 


82.65% 


84.20% 


85.75% 


86.30% 


70 


84.80% 


87.25% 


87.93% 


89.08% 


90.15% 


90.68% 


SO 


90.63% 


92.30% 


92.80% 


93.68% 


94.60% 


95.03% 


90 


93.15% 


94.60% 


94.90% 


95.48% 


96.30% 


96.48% 


100 


95.93% 


96.85% 


96.95% 


97.15% 


97.55% 


97.73% 






Laplace measurement error 






50 


76.93% 


79.80% 


80.70% 


82.40% 


84.65% 


85.28% 


60 


85.55% 


87.85% 


88.50% 


89.78% 


91.00% 


91.35% 


70 


91.08% 


92.55% 


93.03% 


93.73% 


94.45% 


94.70% 
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93.88% 


95.05% 


95.33% 


95.80% 


96.50% 


96.58% 
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95.80% 


96.70% 


97.03% 


97.58% 


98.05% 


98.10% 


100 


97.10% 


97.95% 


98.25% 


98.68% 


99.00% 


99.10% 



dataset, the logarithm of the seismogram amplitude of 20 second surface 
waves (Y) and the logarithm of the seismogram amplitude of longitudinal 
body waves (W) of 62 earthquakes are recorded. The main interest is in 
analyzing how the surface wave is related to the longitudinal body wave. 
Of course, both variables are measured with error. Fuller [17] used a linear 
errors-in-variable model, Y = (3q + /3±X + e and W = X + U . Assuming a 
normal N(0, afj) for U, and using extra available information, Fuller [17] 
estimated the measurement error variance to be afj = 0.035, with standard 
error 0.0086. 

We implement a wild bootstrap procedure that matches the first three 
estimated moments of the experimental error distribution. (The fourth mo- 
ment estimate here is too highly variable to be reliable, and, in fact, its point 
estimate is negative.) Taking afj = 0.035, and employing bootstrap resample 
sizes 100, 200, 300, 400 or 500, we find that the resulting p-values are all 
between 54% and 57%. 

Considering that the value of afj is estimated, we also consider two ex- 
treme cases, where afj equals 0.035 ±3.5 x 0.0086 = 0.0049 and 0.065, respec- 
tively, and apply the testing procedure in these cases. The results associated 
with these values of afj and with different bootstrap resample sizes are re- 
ported in Table 3. 

For none of these parameter settings is the reported p-value small enough 
to cast reasonable doubt on the adequacy of the linear model for this dataset. 
Although the different values of afj cause significant changes to estimators 
of (5q and /3±, the evidence in favor of rejecting the null hypothesis is virtu- 
ally nonexistent, in all three cases. This observation reflects the substantial 
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Table 3 

Alaskan Earthquake example: Testing for linearity between the logarithm of the 
seismogram amplitude of 20 second surface waves (Y ), and the logarithm of the 
seismogram amplitude of longitudinal body waves (X). Three different values for the 
measurement error variance ay are considered. The p-value of the test, p{B), is reported 
as a function of the number of bootstrap resamples, B 



° 2 u 


$o 




p(100) 


p(200) 


p(300) 


p(400) 


p(500) 


0.0049 


-1.65 


1.29 


50.0% 


44.0% 


43.3% 


45.0% 


45.4% 


0.035 


-2.81 


1.51 


55.0% 


55.5% 


54.0% 


54.5% 


56.8% 


0.065 


-4.47 


1.83 


70.0% 


72.5% 


72.3% 


73.5% 


75.8% 



variability in the dataset, noted from a different viewpoint two paragraphs 
above. 

Theoretical properties. We shall assume that 

E(W ip ) < oo, < E(e 2 ) < oo, the distribution of X has a nondegenerate 
continuous component and fffi vanishes only at isolated points; 
w(t) > for each t, w(t) converges to zero faster than any polynomial as 

|t|^ooand max ( \D^ fl\ty l \ 2 fl\t) 2 w{t) dt < oo. 

\<k<pj 

In the context of conventional models for the distribution of U, \D^f^(t)\/f^(t) 
is dominated by a polynomial in t, and in such cases the last part of (15) 
follows from the rest of that assumption. 

When implementing the bootstrap we shall assume, in addition to (14) 
and (15), that 

the support of the distribution of X is contained 

within the finite interval (ci,C2), 

where c\ and C2 are fixed and are used in the definition of Fx at (8). 

Of course, the distribution G = G(- \ K2, ■ ■ ■ , Kq) that we employ to estimate 
F £ , at (9), has by definition finite variance if K2 < 00, so we do not impose 
this as a regularity condition. It is not necessary to stipulate whether the 
distribution G is discrete or continuous. 

The main theoretical properties of our estimator are given in the follow- 
ing theorem. There, part (a) describes limit theory under the null hypothesis 
Hq(p), part (b) asserts consistency of the bootstrap estimator of the distri- 
bution of the test statistic under Hq(p), and part (c) shows that the test is 
able to detect a large class of semiparametric, root-n departures from the 



(14) 
(15) 



(16) 
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null hypothesis. It is straightforward to prove a version of part (b) when 
Ho(p) fails; that result requires conditions on a class of g's for which H\(p) 
holds. 

Theorem 1. Assume that the data on which the test is based are gen- 
erated by the model (1). Then: (a) // (14) and (15) hold, and if the null 
hypothesis Hq(p) is valid [i.e., if (2) holds for some choice of the parame- 
ters (3q, . . . ,f3p], then nT{(5) — > £ in distribution, where £ denotes a random 
variable for which P(0 < £ < oo) = 1, and the distribution of £ depends on 
that of e only through vare. (b) If (14)-(16) hold, and Hq(p) is valid, then 
the distribution ofT*{(3*), conditional on the data, converges in probability 
to that o/£. (c) If (14)-(16) hold, and if the function g = g n in (1) is taken 
to depend on n, as 

(17) g(x)=j2PjX J + n~ 1/2 n(x), 

3=0 

where (3q,...,@P and c> are fixed, and the function 7 is bounded, com- 
pactly supported and, on a subset of the support of the distribution of X that 
has nonzero Lebesgue measure, does not vanish and does not equal almost 
everywhere a polynomial of degree p, then 

(18) lim liminf P(S > s a ) = 1. 

c— >oo n— >oo 

The property stated in part (a) of the theorem that the distribution of 
£ depends on that of e only through vare, is the key to the fact that the 
moment-matching bootstrap is adequate for estimating the distribution of 
e when calibrating the test statistic T((3). By way of comparison, the dis- 
tribution of £ depends on that of X through more than just the first two or 
three moments. 

The function g at (17) represents a local departure from the null hypoth- 
esis Hq(p). Indeed, under the latter hypothesis, g would equal just the first 
part of the right-hand side of (17). Result (18) asserts that, in the case of a 
local departure of this form, the test is asymptotically capable of detecting 
the fact that Hq(p) fails. More particularly, for all sufficiently large n, the 
probability that the test correctly detects the fact that Hq(p) is violated 
exceeds 1 — rj, where 77 > can be chosen arbitrarily small by selecting c in 
(17) sufficiently large. 

The assumption that the distribution of X is compactly supported, used 
in parts (b) and (c) of the theorem, is imposed for convenience and can be 
relaxed; we do not do so since we wish to keep the proof and the regularity 
conditions simple. 

An outline proof of Theorem 1 will be given in the Appendix. There the 
distribution of £ will be given. 



14 



P. HALL AND Y. MA 



5. Extension to the case where the distribution of U is not known. It 

is possible to generalize the estimator f3 so that it applies to settings where 
the distribution of U is estimated from data. At least two cases of this 
type can arise in practice. First, we may observe direct data Ui,...,Un on 
U, and from those data we may construct an explicit estimator, fij(U) = 
N- 1 EM - Uy, of ^(U). Here, U = iV" 1 £ fc U k . Replacing v 3 by ^(£7) 
at each appearance in (3), and in all other respects defining f3 as at (4), we 
obtain a new estimator of (3° = (ft®, . . . ,/3p) T . The convergence rate of the 

new estimator is readily seen to be O p {min(n,A r )- 1/2 }. 

Second, and arguably more realistically, we may observe replicated values 
of Wj, so that our dataset is comprised of pairs (Wik, Yik), for 1 < k < iVj and 
1 < i < n, where = Xi + Ufa and = g(Xi) + e^. Here, the variables 
Xi, Uik and En- are assumed to be totally independent. In longitudinal data 
analysis the N^s are usually small, in the range 2 to 5. 

Let us suppose that the distribution of U is symmetric; this would often 
be a reasonable assumption, and should it fail, a modified version of the 
argument below could be employed. Let Si denote the set of A^(iVj — 1) 
distinct pairs (ki, kz) with 1 < ki 7^ &2 < Ni, and put N = J2i< n Ni(Ni — 1). 
We may estimate the moments Hj(V) = Ety 3 ) of the distribution of V = 
Ui + U2, where U\ and U2 are independent copies of U, 

1 " 

MiW = ^E E (w ikl -w ik2 y. 

Of course, fij(V) = if j is odd. Let clt2j and mnt2j denote the functions 
that give the 2jth cumulant, K2j(Z), of a general random variable Z in terms 
of its moments, and the 2jth moment in terms of the cumulants, 

K 2 j(Z) = dt 2 j{H2(Z), ■ ■ .,(J,2j(Z)}, 

ji2j(Z) = mnt 2 j-j>2(Z), . . . , K 2 j(Z)}. 

The 2rth cumulant of the distribution of U equals half the 2rth cumulant 
of the distribution of V, and so we define, in succession, 

K2 j (V) = dt 2j {MV),--.,thj(V)}, 

p2j(U) = mnt 2 j{^K2{V), \&2j{V)}, 

and fij(U) = for odd j. Provided the number of indices i in the range 
1 < i < n, for which Ni>2, increases at rate n, the convergence rate of the 
new estimator is O p {n~ 1 / 2 ). 

Next we briefly address hypothesis testing when the distribution of U is 
not known and it is assumed that (12) holds. We treat in turn the two earlier 
settings. First, if direct data Ui,...,Un on U are observed, then we may 
construct an explicit characteristic-function estimator, = n~ l J2j ettUj ■ 
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(Here and below, fff denotes an estimator of fff, rather than the Fourier 
transform of an estimator fjj of fjj.) We replace fff in (7) by \f^ c \, perhaps 
incorporating a ridge parameter to make the procedure more robust. [Note 
that, assuming (12), ffi = |/^*|.] This gives a new version of Fx, leading 
directly to new formulae for Fx and Fx- In the direct-data setting we do 
not alter the definitions of ip{t) and 4>(t \ (3), except for replacing ffi by 
in the latter. 

Second, if (12) holds and we observe replicated data (Wjk,Yjk), define 



1 n 

-Y 



COB{t(W jkl - W jk2 )} 



j'=i (fci,fca)e5 3 - 



1/2 



potentially incorporating weights to reduce variability. Substituting fff for 
ffi in (7), and modifying i/)(t), <f>(t \ (3) and (3 by incorporating the replicated 
data, we obtain an analogue of T((3) which does not require knowledge of f^. 
One can also develop analogues, in the case where the distribution of U is 
estimated from data, of bootstrap methods for calibration. 



APPENDIX: OUTLINE PROOF OF THEOREM 1 
Define pj = Pj {uq , . . . , Uj ) , 5j = aj — aj , Aj = Aj — Aj , 

Let Ab denote the (p + l)-vector with kth component A^\ and let Am be 
the (p + 1) x (p + 1) matrix with (k\, fejth component <5^ 1+fc2 . Provided (14) 
holds, the matrix M is finite and strictly positive definite. 

In the notation above, B = B + Ab and M = M + Am- Therefore, by the 
Taylor expansion, 

(A.l) (3 = (M + A M y l {B + A B ) = (3° + Q + p (O, 

where Q = (Q(°\...,Q^) T = M- 1 A B -M- 1 A M M~ 1 B is a (p + l)-vector. 
Since Q is expressible exactly as the mean of n independent and identically 
distributed random (p + l)-vectors with zero expected value, it is readily 
proved that n l / 2 Q is asymptotically normally distributed with zero mean 
and finite variance. 

With Wj = Xj + Uj and Yj = g(Xj) + Ej denoting the data, we have 

(A.2) QW^M-^A^-X: ^(M-^AmW^M^B)^ 
e=o e 1= oe 2 =o 
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(A.3) 



ee (t) ( M ~ i )ktPr~ito-- E )\ibfl x j w s^ +e 3 w j~ r \ 

£=0r=0 v 7 j=i U=o ) 

[M- x ) Ux {M- l B) l%Vr 



P V h+l2 

EE E 

f 1= 0£ 2 =0 r=0 



\ r 
1 



-EC 1 -^ 



3=1 



where E denotes the expectation operator. Note too that 

-. n p n i n 

= ^ E ^ = E el- E 4^ + - E - 



p k 



k=0 H j=l 



3=1 



fc=0 «=0 



fc\ / 1 



E^ 

3=1 



ejtw. 



where r (t) = f^(t)(i~ l A) r /y(*) _1 - Define 



In V fFt/-A-l 



(A.4) 



X2(*) = -E e i e 



3=1 



X3fe(*) =E 

£=0 



^E^>-^)- 



3=0 



In this notation, 

m) - kt i = E /?2xi*(t) + X2 (t) 



fc=0 



fc=0 £=0 



^ 3=1 



\(j)k-i{t). 



The series multiplying (/% — in the last term equals X3k(t)- Using (A.l), 



1 1/2 



(A.5) 



£ /^XifcW + X2(t) - E(4 - /3 fc )A3fe(t) 



fc=0 



fc=0 



1/2 



eft 
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+ O p {n~ l ) 

e n /2 +o P ( n 



where 



E/5?Xi*(t) + X2(t)-£Q W X3tW 

fc=0 fc=0 



w(t)dt 



and is as in (A.3). 

Note that X3k(t) — > S,3k(t) as n — > oo, where 

&*(*) = E f JO ^(^e^)0 fe _,(t) = /^X^A)*/!^)- 

< =o v 7 

Using standard properties of sums of independent random variables, and 
referring to (A.3)-(A.4) to deduce the relationships among Xik(t), Xiit) 
and , it may be proved that 

(a.6) < n - e 

in distribution as n — > oo, where 

p p 



E^(*)+6(*)-E fl( *^»(*) 



fc=0 



fc=0 



io(i) (it, 



£io> • • • j£ip; £2 and i?^, . . . are jointly distributed, Cik(t) an d £2^) are 
complex-valued Gaussian processes with zero means, R = (R^\ . . . ,R^P') T 
is a Gaussian (p + l)-vector with zero mean, and the covariances among 
£ik(t), £2^) and R^ are identical to the covariances among 



X 



E 

1=0 



'}\W^{t)\e 



itw 



itw 



and 



P l 



e=or=o 



EE (M-XipJ £ #(1 - + eW 



l-r 



P V «l+«2 

EE E 

£l=0€ 2 =0 r=0 



?l+^2 

r 



+i2— r 



respectively. Note particularly that these covariances depend on the distri- 
bution of e only through the variance of that quantity. 

Part (a) of Theorem 1 follows from (A. 5) and (A.6). The fact that P(£ > 
0) = 1 can be deduced from the fact that vare > 0. Derivation of part (b) is 
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virtually identical to that of part (a). To prove part (c) of Theorem 1, note 
that the presence of the perturbation n _1 ' 2 c7(x) in (17) influences Q^ k \ at 
(A. 2), only by adding n~ l ^ 2 crjik to the first term on the right-hand side of 
(A. 2) [and, hence, by adding the same quantity to the far right-hand side at 
(A. 3)], where 

p £ 

^ = EE(r) (M- x ) H E{j(X)W £ - r }p r . 

£=0r=0 ^ ' 

The impact of the perturbation on ip(t) can be described completely by 
adding n^c^t) to X 2{t), where rj 2 {t) = E{j(Xy tx }f^(t) = E{-f(X)e itw } 
The perturbation has no effect on </>(• | (5). 

Therefore, retracing the arguments leading to (A. 5), we see that result 
continues to hold if we add, within the modulus signs in the definition 
of £ n , the quantity n^^crj^t), where = ^(t) — r]^{t) and rj^t) = 

J2o<k<p r )ikX3k('t) ■ It follows from this property that part (c) of Theorem 1 
holds provided r/3 is nonzero on a set of positive measure. In the next para- 
graph we derive this property. 

The constants r/io, . . . , rji p are the unique solutions of the equations 

EMX)W k } = £ j W k ^, 0<k<p. 

Equivalently, 71 (x) = J2o<j<pVij xJ is the unique pth degree polynomial for 
which E{j(X)W k } = E{-yi(X)W k } for < k < p. From this property, and 
the fact that fff vanishes only at isolated points and 

m (t) = E{ 7 (X)e uw } - E{ 7l (X)e uw } 

= E[{ 1 (X)- ll {X)}e itx ]f^(t), 

we deduce that 773 vanishes almost everywhere if and only if 7 = 71 almost 
everywhere on the support of the distribution of X. However, the conditions 
imposed for part (c) of Theorem 1 rule this out, and so 7/3 is nonzero on a 
set of positive measure. 
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